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Abstract 

The multiple measurement vector (MMV) problem addresses the identification of unknown input vectors 
that share common sparse support. Even though MMV problems have been traditionally addressed within the 
context of sensor array signal processing, the recent trend is to apply compressive sensing (CS) due to its 
capability to estimate sparse support even with an insufficient number of snapshots, in which case classical 
array signal processing fails. However, CS guarantees the accurate recovery in a probabilistic manner, which 
often shows inferior performance in the regime where the traditional array signal processing approaches 
succeed. The apparent dichotomy between the probabilistic CS and deterministic sensor array signal processing 
has not been fully understood. The main contribution of the present article is a unified approach that unveils 
a missing link between CS and array signal processing. The new algorithm, which we call compressive 
MUSIC, identifies the parts of support using CS, after which the remaining supports are estimated using 
a novel generalized MUSIC criterion. Using a large system MMV model, we show that our compressive 
MUSIC requires a smaller number of sensor elements for accurate support recovery than the existing CS 
methods and that it can approach the optimal Zo-bound with finite number of snapshots. 
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I. Introduction 

Compressive sensing (CS) theory [1-3] addresses the accurate recovery of unknown sparse signals from 
underdetermined hnear measurements and has become one of the main research topics in the signal processing 
area. Compressive sensing has had a significant impact on many applications, such as magnetic resonance 
imaging [4-6], x-ray computed tomography [7], communication [8], remote sensing [9], etc. Most of the 
compressive sensing theories have been developed to address the single measurement vector (SMV) problem 
[1-3]. More specifically, let m and n be positive integers such that m < n. Then, the SMV compressive 
sensing problem is given by 

(PO) : minimize ||x||o (I.l) 

subject to b = As., 

where b e M"*, A G M"*^", x e M", and ||x||o denotes the number of non-zero elements in the vector x. 
Since (PO) requires a computationally expensive combinatorial optimization, greedy methods [10], reweighted 
norm algorithms [11, 12], convex relaxation using li norm [2, 13], or Bayesian approaches [14, 15] have been 
widely investigated as alternatives. One of the important theoretical tools within this context is the so-called 
restricted isometry property (RIP), which enables us to guarantee the robust recovery of certain input signals 
[3]. More specifically, a sensing matrix A e M™^" is said to have a fc-restricted isometry property(RIP) if 
there is a constant < Sk <1 such that 

(l-4)||xf <||Axf <(l + 4)||xf 

for all X e M" such that ||x||o < k. It has been demonstrated that d2k < — 1 is sufficient for Ii/Iq 
equivalence [2]. For many classes of random matrices, the RIP condition is satisfied with extremely high 
probability if the number of measurements satisfies m> ck \og{n/k) for some constant c > [3]. Ever since 
the pioneering work by Candes, Romberg, and Tao [2] was pubUshed, many important theoretical discoveries 
have been made. For example, the necessary and/or sufficient conditions for the sparse recovery by maximum 
likelihood method [16], p-thresholding [16], and orthogonal matching pursuit [17] have been extensively 
studied. Furthermore, the geometry of li recovery has been revealed using the high dimensional polytope 
geometry [18]. A recent breakthrough in SMV compressive sensing is the discovery of an approximate 
message passing algorithm [19] that has striking similarity with the iterative thresholding method [20], 
while achieving theoretical optimaUty. 

Another important area of compressive sensing research is the so-called multiple measurement vector 
problem (MMV) [21-24]. The MMV problem addresses the recovery of a set of sparse signal vectors that 
share conmion non-zero support. More specifically, let m, n and r be positive integers such that m < n. In 
the MMV context, m and r denote the number of sensor elements and snapshots, respectively. For a given 
observation matrix B e W^'''', a sensing matrix A e M"^" such that B = AX^ for some e M"^'', the 
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multiple measurement vector (MMV) problem is formulated as: 



minimize ll-'^Ho 



(1.2) 



subject to B = AX, 



where X = [xi,--- ,Xr] e M"^'' and ||X||o = |suppX|, where suppX = {1 < i < n : x* ^ 0} and 
x' is the i-th row of X. The MMV problem also has many important applications such as distributed 
compressive sensing [25], direction-of-arrival estimation in radar [26], magnetic resonance imaging with 
multiple coils [27], diffuse optical tomography using multiple illumination patterns [28, 29], etc. Currently, 
greedy algorithms such as S-OMP (simultaneous orthogonal matching pursuit) [21,30], convex relaxation 
methods using mixed norm [31,32], M-FOCUSS [22], M-SBL (Multiple Sparse Bayesian Learning) [33], 
randomized algorithms such as REduce MMV and BOost (ReMBo)[23], and model-based compressive 
sensing using block- spar sity [34, 35] have also been applied to the MMV problem within the context of 
compressive sensing. 

In MMV, thanks to the common sparse support, it is quite predictable that the recoverable sparsity level 
may increase with the increasing number of measurement vectors. More specifically, given a sensing matrix 
A, let spark(A) denote the smallest number of linearly dependent columns of A. Then, according to Chen 
and Huo [21], Feng and Bresler [36], if X e R"^'' satisfies AX = B md 



then X is the unique solution of (1.2). In (1.3), the last inequahty comes from the observation that rank(B) < 
||-'^*||o := |suppX*|. Recently, Davies and Eldar showed that (1.3) is indeed a necessary codition for X 
to be a unique solution for AX = B [37]. Compared to the SMV case (rank(B) = 1), (1.3) informs us 
that the recoverable sparsity level increases with the number of measurement vectors. Furthermore, average 
case analysis [38] and information theoretic analysis [39] have indicated the performance improvements of 
MMV algorithms with an increasing number of snapshots. However, the performance of the aforementioned 
MMV compressive sensing algorithms are not generally satisfactory, and significant performance gaps still 
exist from (1.3) even for a noiseless case when only a finite number of snapshots is available. 

On the other hand, before the advance of compressive sensing, the MMV problem (1.2), which was often 
termed as direction-of-arrival (DOA) or the bearing estimation problem, had been addressed using sensor 
array signal processing techniques [26]. One of the most popular and successful DOA estimation algorithms 
is the so-called the MUSIC (Multiple Signal Classification) algorithm [40]. MUSIC first calculates the 
signal subspace and noise subspace by decomposing the empirical covariance matrix; then, by exploiting 
the orthogonahty between the noise subspace and signal manifold at the correct target locations, MUSIC 
identifies the target locations. The MUSIC estimator has been proven to be a large snapshot (for r ^ 1) 
reahzation of the maximum hkehhood estimator for any m > fc, if and only if the signals are uncorrelated 



ll^llo< 



spark(A) -I- rank(B) - 1 
2 



< spark(A) - 1 



(1.3) 
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[41]. As will be shown later when rank(_B) = k and the row vectors X are in general position, the maximum 
sparsity level that is uniquely recoverable using the MUSIC approach is 

||X||o < spark(A) - 1 , (1.4) 

which implies that the MUSIC algorithm achieves the Iq bound (1.3) of the MMV when rank(_B) — k. 
However, one of the main limitations of the MUSIC algorithm is its failure when rank(_B) < k. This 
problem is often called the "coherent source" problem within the sensor array signal processing context 
[26]. For example, MUSIC cannot identify any target with a single snapshot, whereas the compressive 
sensing approaches can identify the location with extremely large probability. 

To the best of our knowledge, this apparent "missing link" between compressive sensing and sensor array 
signal processing for the MMV problem has not yet been discussed. The main contribution of the present 
article is, therefore, to provide a new class of algorithms that unveils the missing link. The new algorithm, 
termed compressive MUSIC (CS-MUSIC), can be regarded as a deterministic extension of compressive 
sensing to achieve the optimality, or as a generalization of the MUSIC algorithm using a probabilistic 
setup to address the difficult problem of the coherent sources estimation. This generalization is due to 
our novel discovery of a generalized MUSIC criterion, which tells us that an unknown support of size 
rank(_B) can be estimated deterministically as long as a fc — rank(_B) support can be estimated with 
any compressive sensing algorithm such as S-OMP or thresholding. Therefore, as rank(_B) approaches 
k, our compressive MUSIC approaches the classical MUSIC estimator; whereas, as rank(i?) becomes 
1, the algorithm approaches to a classical SMV compressive sensing algorithm. Furthermore, even if the 
sparsity level is not known a priori, compressive MUSIC can accurately estimate the sparsity level using 
the generalized MUSIC criterion. This emphasizes the practical usefulness of the new algorithm. Since the 
fraction of the support that should be estimated probabilistically is reduced from /c to fc — rank(_B), one can 
conject that the required number of sensor elements for compressive MUSIC is significantly smaller than that 
for conventional compressive sensing. Using the large system MMV model, we derive explicit expressions for 
the minimum number of sensor elements, which confirms our conjecture. Furthermore, we derive an explicit 
expression of the minimum SNR to guarantee the success of compressive MUSIC. Numerical experiments 
confirm out theoretical findings. 

The remainder of the paper is organized as follows. We provide the problem formulation and mathematical 
preliminaries in Section 11, followed by a review of existing MMV algorithms in Section 111. Section IV 
gives a detailed presentation of the generalized MUSIC criterion, and the required number of sensor elements 
in CS-MUSlC is calculated in Section V. Numerical solutions are given in Section VI, followed by the 
discussion and conclusion in Section VII and VIII, respectively. 
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11. Problem Formulation and Mathematical Preliminaries 

Throughout the paper, x* and Xj correspond to the i-th row and the j-th column of matrix X, respectively. 
When S is an index set, X^, As corresponds to a submatrix collecting corresponding rows of X and columns 
of A, respectively. The following noiseless version of the canonical MMV formulation is very useful for our 
analysis. 

Definition 2.1 (Canonical form noiseless MMV): Let m, n and r be positive integers (r < m < n) 
that represent the number of sensor elements, the ambient space dimension, and the number of snapshots, 
respectively. Suppose that we are given a sensing matrix A E R'"^*" and an observation matrix B E r'"^'' 
such that B = AX^, for some X^, E M."^^ and ||^*|| = |suppX| — k. A canonical form noiseless multiple 
measurement vector (MMV) problem is given the estimation problem of /c-sparse vectors X G R"^'' through 
multiple snapshots B = AX using the following formulation: 

minimize ||^||o (II-5) 

subject to B = AX, 

where ||X||o = |suppX|, suppX = {1 < i < n : x' ^ 0}, x' is the i-th row of X, and the observation 
matrix B is full rank, i.e. rank(B) = r < k. 

Compared to (1.2), the canonical form MMV has the additional constraint that rank(S) = r < H-'^Ho- 
This is not problematic though since every MMV problem can be converted into a canonical form using the 
following dimension reduction. 

• Suppose we are given the following linear sensor observations: B = AX where A e R™^" and 
X e M"^' satisfies ||X||o = k. 

• Compute the SVD as B = UDj-V*, where is an r x r diagonal matrix, y e C'^'' consists of right 
singular vectors, and r = rank(B), respectively. 

• Reduce the dimension as Bsv = BV and Xsv = XV. 

• The resulting canonical form MMV becomes Bgv = ^Xgv- 

We can easily show that rarLk(B5y) = r < k and the sparsity k := ||X||o = ||Xs:y||o with probabiUty 1. 
Therefore, without loss of generality, the canonical form of the MMV in Definition 2.1 is assumed throughout 
the paper. 

The following definitions are used throughout this paper. 

Definition 2.2: [18] The rows (or colunms) in M" are in general position if any n collection of rows (or 
columns) are Unearly independent. 

If A G M™^", where m <n, the columns of A are in general position if and only if spark(A) = m + 1. 
Also, it is equivalent to K-Tank{A) = m where ii"— rank denotes the Kruscal rank, where a Kruscal rank 
of A is the maximal number q such that every collection of q columns of A is Unearly independent [23]. 

Definition 2.3 (Mutual coherence): For a sensing matrix A= [ai, • • • , a„] e M™^", the mutual coherence 
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IJ,{A) is given by 



H = max - — fj-r — 77, 
i<j<k<n ||aj||||afe|| 



a*afe 



where the superscript * denotes the Hermitian transpose. 

Definition 2.4 (Restricted Isometry Property (RIP)): A sensing matrix A e W^^^ is said to have a k- 
restricted isometry property (RIP) if there exist left and right RIP constants < 5^, (5^ < 1 such that 



for all X e K" such that ||x||o < fc. A single RIP constant 5k = max{5j^,5j^} is often referred to as the 
RIP constant. 

Note that the condition for the left RIP constant < < 1 is sufficient for the uniqueness of any 
fc-sparse vector x satisfying Ax. = b for any fc-sparse vector x, but the condition 52k < 1 is often too 
restrictive. 



In this section, we review the conventional algorithms for the MMV problem and analyze their limitations. 
This survey is useful in order to understand the necessity of developing a new class of algorithms. Except 
for the MUSIC and cumulant MUSIC algorithm, all other algorithms have been developed in the context of 
compressive sensing. We will show that all the existing methods have their own disadvantages. In particular, 
the maximum sparsity levels that can be resolved by these algorithms are limited in achieving the maximum 
gain from joint sparse recovery. 

A. Simultaneous Orthogonal Matching Pursuit (S-0MP)[21, 30] 

The S-OMP algorithm is a greedy algorithm that performs the following procedure: 

• at the first iteration, set Bq = B and = 0, 

• after J iterations, Sj = and Bj = (7 — Psj)B, where Psj is the orthogonal projection onto 

span{a;J/^i, 

• select Ij+i such that llaf BAU = max llaf BjlU and set Sj+i = SjU {Ij+A. 

-r M ( j+i M 1<1<N " ' " ^ L -r J 

Worst case analysis of S-OMP [42] shows that a sufficient condition for S-OMP to succeed is 



(l-^,^)||xf <||^xf <(l + ^«)||xf 



III. Conventional MMV Algorithms 



max \\Alaj\\i < 1 
jesuppA 



(lll.l) 



where S = suppX. An expUcit form of recoverable sparsity level is then given by 




(III.2) 
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Note that these conditions are exactly the same as Tropp's exact recovery conditions for the SMV problem 
[43], implying that the sufficient condition for the maximum sparsity level is not improved with an increasing 
number of snapshots even in the noiseless case. In order to resolve this issue, the authors in [42] and [38] 
performed an average case analysis for S-OMP, and showed that S-OMP can recover the input signals for 
the MMV problem with higher probability when the number of snapshots increases. However, the simulation 
results in [42] and [38] suggest that S-OMP performance is saturated after some number of snapshots, even 
with noiseless measurements, and S-OMP never achieves the bound with a finite number of snapshots. 

B. 2-Thresholding [42] 

In 2- thresholding, we select a set S with \S\ = k such that 

||a*-B||2 > ||a*B||2, for all I e S, j ^ S. 

If we estimate the suppX by the above criterion, we can recover the nonzero component of X by the 
equation = AI^Y. In [42], the authors demonstrated that the performance of 2-thresholding is often not 
as good as that of S-OMP, which suggests that 2-thresholding never achieves the /o-bound (1.3) with finite 
snapshots even if the measurements are noiseless. 

C. ReMBO algorithm [23] 

Reduce MMV and Boost (ReMBo) by Mishali and Eldar [23] addresses the MMV problem by reducing 
it to a series of SMV problems based on the following. 

Theorem 3.1: [23] Suppose that X satisfies ||X||o = k and AX = B with k < spark(A)/2. Let v eW 
be a random vector with an absolutely continuous distribution and define b = Av and x = Xv. Then, for 
a random SMV system Ax = b and b = Bv, we have 

(a) For every v, the vector x is the unique fc-sparse solution. 

(b) Prob(supp(x) = supp(x)) = 1. 

Employing the above theorem, Mishali and Eldar [23] proposed the ReMBo algorithm which performs 
the following procedure: 

• set the maximum number of iterations as Maxlters, set i = 1 and Flag = F, 

• while i < Maxlters and Flag = F, generate a random SMV problem as in Theorem 3.1, 

- if the SMV problem has a fc-sparse solution, then we let S be the support of the solution vector, 
and let Flag = T 

- otherwise, increase i by 1 

• if Flag = T, find the nonzero components of X by the equation X^ = AgB. 

In order to achieve the lo bound (1.4) by ReMBO without any combinatorial SMV solver, an uncountable 
number of random vectors v are required. With a finite number of choices of v, the performance of ReMBo 
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is therefore dependent on randomly chosen input and the solvability of a randomly generated SMV problem 
so that it is difficult to achieve the theoretical l°-bound even with noiseless measurements. 

D. Mixed norm approach [32] 

The mixed norm approach is an extension of the convex relaxation method in SMV [10] to the MMV 
problem. Rather than solving the original MMV problem (II. 5), the mixed norm approaches solve the 
following convex optimization problem: 

minimize l<p,q<2 (in.3) 

subject to B = AX, 

where ||Xj|p ^ = II^NIp)'- The optimization problem can be formulated as an SOCP (second order 

cone program) [31], homotopy continuation [44], and so on. Worst case bounds for the mixed norm approach 
were derived in [21], which shows no improvement with the increasing number of measurement. Instead, 
Eldar et al [38] considered the average case analysis when p = 2 and q = I and showed that if 

max ||^^aj||2 < a < 1, 

where S = snppX, then the probability success recovery of joint sparsity increases with the number of 
snapshots. However, it is not clear whether this convex relaxtion can achieve the Iq bound. 

E. Block sparsity approaches [34] 

Block sparse signals have been extensively studied by Eldar et al using the uncertainty relation for the 
block-sparse signal and block coherence concept. Eldar et al. [34] showed that the block sparse signal can 
be efficiently recovered using a fewer number of measurements by exploiting the block sparsity pattern as 
described in the following theorem: 

Theorem 3.2: [34] Let positive integers L,n,N and D = [D[l],-- - ,D[n]] e M^^^ be given, where 
L < N, N = nr for some positive integer r and for each 1 < j < n, D[j] G R^^^. Let hb be the 
block-coherence which is defined by 

MB = , m£« -p(D[i]*D[fc]) 

l<j<fc<n r 

where p denotes the spectral radius, be the sub-coherence of the sensing matrix A which is defined by 

v = maxmax |d*dj|, di,dj G 

and r be the block size. Then, the block OMP and block mixed 1-2 jli optimization program successfully 
recover the fc-block sparse signal if 

^r<\ f/xs' + r - (r - 1)— y (III.4) 
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Note that we can transform B = AX into an SMV system vcc{B'^) = {A(^ /r)vcc(X^), where vcc(X^) 
is block-fc sparse with length r and denotes the Kronecker product of matrices. Therefore, one may think 
that we can use the block OMP or block h/h optimization problem to solve the MMV problem. However, 
the following theorem shows that this is pessimistic. 

Theorem 3.3: For the canonical MMV problem in Definition 2.1, a sufficient condition for recovery using 
block-sparsity is 



Proof: Since A<g)Ir = [oi,j/r]"j=i, if we let ^(g)/^ = [D[l], • • • ,D[n]], we have u = due to the 



by the definition of mutual coherence. Applying (III A) with u = and iib — we obtain (111.5). ■ 
Note that (III.5) is the same as that of OMP for SMV. The main reason for the failure of the block sparse 
approach for the MMV problem is that the block sparsity model does not exploit the diversity of unknown 
matrix X. For example, the block sparse model cannot differentiate a rank-one input matrix X and full-rank 
matrix X. 

F. M-SBL [33] 

M-SBL (Sparse Bayesian Learning) by Wipf and Rao [45] is a Bayesian compressive sensing algorithm 
to address the lo minimization problem. M-SBL is based on the ARD (automatic relevance determination) 
and utilizes an empirical Bayesian prior thereby enforcing a joint sparsity. Specifically, the M-SBL performs 
the following procedure: 

(a) initialize 7 and T := diag(7) e M"^". 

(b) compute the posterior variance S and mean X as follows: 



(III.5) 



where fx denotes the mutual coherence of the sensing matrix A e 



diagonality, and 




s := r -rA*{ArA* + XI) 



^AT 



X := rA*{ATA* +XI)-'^B, 



(c) 



where A > denotes a regularization parameter, 
update 7 by 



{new) 



1 



1 < j < n 



^ 1-77%/ 



(d) 



repeat (b) and (c) until 7 converges to some fixed point 7*. 
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Wipf and Rao [45] showed that increasing the number of snapshots in SBL reduces the number of local 
minimizers so that the possibility of recovering input signals increases from joint sparsity. Furthermore, 
in the noiseless setting, if we have k linearly independent measurements and the nonzero rows of X are 
orthogonal, there is a unique fixed point 7* so that we can correctly recover the fc-sparse input vectors. 
To the best of our knowledge, M-SBL is the only compressive sensing algorithm that achieves the same 
Zo-bound as MUSIC when r = fc. However, the orthogonality condition for the input vector X that achieves 
the maximal sparsity level is more restricted than that of MUSIC. Furthermore, no explicit expression for 
the maximum sparsity level was provided for the range i&uk{B) < k. 

G. The MUSIC Algorithm [36,40] 

The MUSIC algorithm was originally developed to estimate the continuous parameters such as bearing 
angle or DOA. However, the MUSIC criterion can be still modified to identify the support set from the finite 
index set as follows. 

Theorem 3.4: [36,40](MUSIC Criterion) Assume that we have r linearly independent measurements B e 
j^mxr g^^j^ jj^^j ^ _ ^j^^ fQj. x.^, eM."^'^ and r = ||X*||o fc < m. Also, we assume that the columns of 
a sensing matrix A € R™^" are in general position; that is, any collection of m columns of A are linearly 
independent. Then, for any j S {1, • • • i n}, j G suppX* if and only if 

Q*aj = 0, (in.6) 

or equivalently 

a*Pfl(Q)a, = (III.7) 

where Q G M™x consists of orthonormal columns such that Q*B = so that R{Q)-^ = R{B), which 
is often called "noise subspace". Here, for matrix A, R{A) denotes the range space of A. 

Proof: By the assumption, the matrix of multiple measurements B can be factored as a product B = 
AsX^ where As G IR'"x'= and X^ e R'^^'^, where S = suppX*, As is the matrix which consists of 
columns whose indices are in S and X^ is the matrix that consists of rows whose indices are in S. Since 
As has fuU column rank and X^ has full row rank, R{B) = R{As). Then we can obtain a singular value 
decomposition as 

B=[U Q]diag[ai,--- ,afc, 0, • • • , 0]F*, 

where R{U) = R{As) = R{Q)^. Then, Q*aj = if and only if e R{Q)-^ = R{As) so that a^ can be 
expressed as a linear combination of {afcjfces. Since the columns of A are in general position, Q*aj = 
if and only if j £ suppX*. ■ 
Note that the MUSIC criterion (III.6) holds for all m > /c + 1 if the columns of A are in general position. 
Using the compressive sensing terminology, this implies that the recoverable sparsity level by MUSIC (with 
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a probability 1 for the noiseless measurement case) is given by 

||X||o < m = spark(A) - 1, (III.8) 

where the last equality comes from the definition of the spark. Therefore, the Iq bound (1.3) can be achieved 
by MUSIC when r = k. However, for any r < k, the MUSIC condition (III. 6) does not hold. This is a major 
drawback of MUSIC compared to the compressive sensing algorithms that allows perfect reconstruction with 
extremely large probabiUty by increasing the number of sensor elements, m. 

H. Cumulant MUSIC 

The fourth-order cumulant or higher order MUSIC was proposed by Porat and Friedlander |46| and 
Cardoso [47] to improve the number of resolvable resolvable sources over the conventional second-order 
MUSIC. Specifically, the cumulant MUSIC derives a MUSIC-type subspace criterion from the cumulant 
of the observation matrix. It has been shown that the cumulant MUSIC can resolve more sources than 
conventional MUSIC for specific array geometries [48 1. However, a significant increase in the variance of 
the target estimate of a weak source in the presence of stronger sources has been reported, which was not 
observed for second order MUSIC [49]. This increase often prohibit the use of fourth-order methods, even 
for large SNR, when the dynamic range of the sources is important [49]. Furthermore, for general array 
geometries, the performance of the cumulant MUSIC is not clear. Therefore, we need to develop a new type 
of algorithm that can overcome these drawbacks. 

/. Main Contributions of Compressive MUSIC 

Note that the existing MMV compressive sensing approaches are based on a probabilistic guarantee, 
whereas array signal processing provides a deterministic guarantee. Rather than taking such extreme view 
points to address a MMV problem, the main contribution of CS-MUSIC is to show that we should take 
the best of both approaches. More specifically, we show that as long as A: — rank(i3) partial support can 
be estimated with any compressive sensing algorithms, the remaining unknown support of rank(i3) can be 
estimated deterministically using a novel generalized MUSIC criterion. By allowing such hybridization, our 
CS-MUSIC can overcome the drawbacks of the all existing approaches and achieves the superior recovery 
performance that had not been achievable by any of the aforementioned MMV algorithms. Hence, the 
following sections discuss what conditions are required for the generalized MUSIC and partial support 
support recovery to succeed, and how CS-MUSIC outperforms existing methods. 

IV. Generalized MUSIC criterion for compressive MUSIC 

This section derives an important component of compressive MUSIC, which we call the generalized 
MUSIC criterion. This extends the MUSIC criterion (III.6) for r < k. Recall that when we obtain k linearly 
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independent measurement vectors, we can determine the support of multiple signals with the condition that 
Q*aj = if and only if j e suppX. In general, if we have r linearly independent measurement vectors, 
where r < k, we have the following. 

Theorem 4.1 (Generalized MUSIC criterion): Let m, n and r be positive integers such that r < m < n. 
Suppose that we are given a sensing matrix A E IR™^" and an observation matrix B E M™^''. Assume that 
the MMV problem is in canonical form, that is, rank(i?) = r < k. Then, the following holds: 

(a) spark((5*A) < fc - r + 1. 

(b) If the k nonzero rows are in general position (i.e., any collection of r nonzero rows are linearly 
independent) and A satisfies the RIP condition with < (52;._^_,_^(A) < 1, then 

spark((5*A) = k-r + l. 

Proof: See Appendix A. ■ 
Note that, unlike the classical MUSIC criterion, a condition for the left RIP constant < < 1 

is required in Theorem 4.1 (b). This condition has the following very interesting implication. 

Lemma 4.2: For the canonical form MMV, A G R^^" satisfies RIP with < d^k-r+i < 1 if and only if 

^ ^ spark(A)+rank(B)- 1 ^^^^^ 

Proof: Since A e ]R™><" has the left RIP condition < (^gfe^^+i < 1' any collection of 2A; - r + 1 
columns of A are linearly independent so that spark(A) > 2A; — r + 1. Hence, 

spark(A) + r - 1 _ spark(A) + rank(B) - 1 
2 ~ 2 

since r = rank(B). For the converse, assume the condition (IV.l). Then we have 2A; — r + 1 < spark(A) 
which implies < 52^_^j^i < 1. ■ 

Hence, if A satisfies RIP with < < 1 and if we have fc-sparse coefficient matrix X that satisfies 

AX = B, then X is the unique solution of the MMV. In other words, under the above RIP assumption, 
for noiseless case we can achieve the /o-uniqueness bound, which is the same as the theoretical limit (1.3). 
Note that when k = r, we have spark(Q*A) = 1, which is equivalent to there being some j's such that 
Q*aj = 0, which is equivalent to the classical MUSIC criterion. By the above lemma, we can obtain a 
generalized MUSIC criterion for the case r < fc in the following theorem. 

Theorem 4.3: Assume that A e M*"^", X € R"'''', and B G M™^'^ satisfy AX = B and the conditions 
in Theorem 4.1 (b). If h-r C suppX with \Ik-r\ = fc - r and e R^xC'-^), which consists of 

columns, whose indices are in Ik-r- Then for any i € {1, • • • , n} \ Ik-r, 

rank(Q* , a,]) = A; - r (IV2) 



if and only if j e suppX. 
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Proof: See Appendix B. ■ 
When r = k, Aii__^ = and (IV.2) is the same as the classic MUSIC criterion (III. 6) since rank((5*aj) = 
<;=^ Q*a.j ~ 0. However, the generalized MUSIC criterion (IV.2) for r < fc is based on the rank of the 
matrix, which is prone to error under an incorrect estimate of noise subspace Q when the measurements 
are corrupted by additive noise. Hence, rather than using (IV.2), the following equivalent criterion is more 
practical. 

Corollary 4.4: Assume that A e K™^", X e R"^% B e M™^'^, C suppX, and Aj^_^ are the 

same as in Theorem 4.3. Then, 



(IV.3) 



if and only if j G suppX. 

Proof: See Appendix B. ■ 
Note that -Pr(q) — QQ* in MUSIC criterion (III. 7) is now replaced by Pr(q) — PR[Pfi^Q^Ai^, ] where 
Ik-r C suppX. The following theorem shows that Pr{q) — Pr[Pihq)Ai^ ] has very important geometrical 
meaning. 

Theorem 4.5: Assume that we are given a noiseless MMV problem which is in canonical form. Also, 
suppose that A and X satisfy the conditions as in Theorem 4.1 (b). Let U G M''"^'"and Q e k™x(™-''-) 
consist of orthonormal columns such that R{U) — R{B) and R{Q)^ = R(B). Then the following properties 
hold : 

(a) UU* + Pr{qq'Ai^ ) is equal to the orthogonal projection onto R{B) + R{QQ* Ai^_^). 

(b) QQ* — PRiQQ'Ai^ ) is equal to the orthogonal projection onto R{Q) n R{QQ* Aij^_^)^ . 

(c) QQ* — Pr{qq'Ai^ ) is equal to the orthogonal complement of R{[U Aj^_^]) or R{[B Aj^_^]). 
Proof: See Appendix C. ■ 



(noise subspace for 
generalized MUSIC) 




(noise subspace 

for MUSIC) 



R{QQ'A,,_ 



(signal subspace) 



Fig. 1 . Geometric view for the generalized MUSIC criterion : the dashed line corresponds to the conventional MUSIC criterion, where 
the squared norm of the projection of aj(j G suppX) onto the noise subspace R{Q) may not be zero. aj(j g suppX) is orthogonal 
to the subspace /?(P^(q) — Pr^qq* Aj )) th^t we can identify the indices of the support of X with the generalized MUSIC 
criterion. 
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Figure 1 illustrates the geometry of corresponding subspaces. Unlike the MUSIC, the orthogonality of the 
a.j, j E suppX need to be checked with respect to R{Q) D R{QQ* Ai^_^)^ . Based on the geometry, we 
can obtain following algorithms for support detection. 

(Algorithm 1: Original form) 

1) Find k — r indices of suppX by any MMV compressive sensing algorithms such as 2- thresholding or 
SOMP. 

2) Let Ik-r be the set of indices which are taken in Step 1 and S = Ik-r- 

3) For i e {1, • • • , n}\Ik-r, calculate the quantities -qd) = a* [Pr(q) -Pp„(^j,A,^_ Ja^ for all j ^ h-r- 

4) Make an ascending ordering of ri{j), j ^ Ik-r-: choose indices that correspond to the first r elements, 
and put these indices into S. 

(Algorithm 2: Signal subspace form) 

Alternatively, we can also use the signal subspace form to identify the support of X: 

1) Find k — r indices of suppX by any MMV compressive sensing algorithms such as 2-thresholding or 
SOMP. 

2) Let Ik-r be the set of indices which are taken in Step 1 and S = Ik-r- 

3) For j e {!,••• ,n} \ Ik-r, calculate the quantities r/(j) = a^[PR([/) + Pr[p^^^^Aij^ for ^1 

j i h-r- 

4) Make a descending ordering of 7?(j), j ^ /fe-r. choose indices that correspond to the first r elements, 
and put these indices into S. 

In compressive MUSIC, we determine k — r indices of suppX with CS-based algorithms such as 2- 
thresholding or S-OMP, where the exact reconstruction is a probabilistic matter. After that process, we 
recover remaining r indices of suppX with a generalized MUSIC criterion, which is given in Theorem 4.3 
or Corollary 4.4, and this reconstruction process is deterministic. This hybridization makes the compressive 
MUSIC applicable for all ranges of r, outperforming all the existing methods. 

So far, we have discussed about the recovery of the support of the multiple input vectors assuming that 
we know about the size of the support. One of the disadvantages of the existing MUSIC-type algorithms is 
that if the sparsity level is overestimated, spurious peaks are often observed. However, in CS-MUSIC when 
we do not know about the correct size of the support, we can still apply the following lemma to estimate 
the size of the support. 

Lemma 4.6: Assume that A € X.^ € M"'''' and B £ M™^'' satisfy AX^ = B and the conditions in 

theorem 4.1 (b), and k denotes the true sparsity level, i.e. k = ||X*||o. Also, assume that r <k <k + r and 
we are given If._^ C suppX with |/^_^| —k — r, where is the partial support of size k — r estimated 
by any MMV compressive sensing algorithm. Also, we let r]{j) := a*[P[{(^Q) — Pr(p^^i^^a,^ Then, 
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k = k = \\X^\\o if and only if 



C(k) := min V Tj(j) = 0. (IV.4) 



Proof: Necessity is trivial by Corollary 4.4 so we only need to show sufficiency of (IV.4) assuming the 
contrary. We divide the proof into two parts. 

(i) r < fc < fc : By the Lemma 4.1, for any j e {1, • • • ,n}\ 

rank[Q|._JA^_^,aj]] =fc-r + l. 

As in the proof of CoroUary 4.4, this imphes ri{j) > for any j e {1, • • • ,n} \ so that we have 
C{k) > for fc < fc. 

(ii) k < k < k + r : Here, we have already chosen at least k — r + 1 indices of the support of X. By 
Corollary 4.4, (IV.3) holds only for, at most, r — 1 elements of {!,••• ,n}\ since c suppX. 
Hence, C{k) > for k > k. ■ 
The minimization in (IV.4) is over all index sets J of size r that include elements from {1, • • • ,n} and 
no elements form For fixed k and this minimization can be performed by first computing the 
summands for all j e {1, • • • , n} \ 7^, and then selecting the r of smallest magnitude. Lemma 4.6 also 
tells us that if we calculate C{k) by increasing k from r, then the first k such that C{k) = corresponds 
to the unknown sparsity level. For noisy measurements, we can choose the first local minimizer of C{k) by 
increasing k. 



V. Sufficient Conditions for Sparse Recovery using Compressive MUSIC 



A. Large system MMV model 

Note that the recovery performance of compressive MUSIC rehes entirely on the correct identification of 
k — r partial support in suppX via compressive sensing approaches and the remaining r indices using the 
generahzed MUSIC criterion. In practice, the measurements are noisy, so the theory we derived for noiseless 
measurement should be modified. In this section, we derive sufficient conditions for the minimum number 
of sensor elements (the number of rows in each measurement vector) that guarantee the correct support 
recovery by compressive MUSIC. Note that for the success of compressive MUSIC, both CS step and the 
generahzed MUSIC step should succeed. Hence, this section derives separate conditions for each step, which 
is required for the success of compressive MUSIC. 

For SMV compressive sensing, Fletcher, Rangan and Goyal [16] derived an explicit expression for the 
minimum number of sensor elements for the 2-thresholding algorithm to find the correct support set. Also, 
Fletcher and Rangan [17] derived a sufficient condition for S-OMP to recover X. Even though their derivation 
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is based on a large system model with a Gaussian sensing matrix, it has provided very useful insight into 
the SMV compressive sensing problem. Therefore, we employed a large system model to derive a sufficient 
condition for compressive MUSIC. 

Definition 5.1: A large system noisy canonical MMV model, LSMMV(m, n, fc, r; e), is defined as an 
estimation problem of fc-sparse vectors X G R"^' that shares a common sparsity pattern through multiple 
noisy snapshots Y = AX + N using the following formulation: 

minimize ||^||o (V.l) 
subject to Y = AX + N, 

where A e M™^" is arandommatrix withi.i.d. A/'(0, 1/m) entries, A/' = [ni,--- ,11^] e M™^'' is an additive 
noise matrix whose components have i.i.d. ^(0, 1/m) entries, and m = m(n) oo,k = k{n) 00 and 
r = r(n) 00 such that fc/m < 1 — e, r/fc < 1 — e for some e > and rank(^X) = r <k = \X\q. 
Here, we assume that p := lim„^oo m{n)/n > and a = lim„^oo r{n)/k{n) > exist. 

Note that the conditions k/m < 1 — e, and r/k < 1 — e are technical conditions that prevent to, k, and r 
from reaching equivalent values when n ^ co. 

B. Sufficient condition for generalized MUSIC 

For the case of a noisy measurement, Y is corrupted and the corresponding noise subspace estimate Q is 
not correct. However, the following theorem shows that if the Ik-r C suppX, then the generalized MUSIC 
estimate is consistent and achieves the correct estimation of the remaining r-indices for sufficiently large 
SNR. 

Theorem 5.1: For a LSMMV(to, n, fc, r; e), if we have Ik-r C suppX, then we can find remaining r 
indices of suppX with the generalized MUSIC criterion if 

,(l + 5)(2fc-r + l)| (V.2) 

for some 5 > provided that SNRmin(^) := C7min(-B)/||A/'|| > 1 + 4(k(B) + 1), where k{B) denotes the 
condition number and a^amiB) denotes the smallest singular value of B. 

Proof: See Appendix D. ■ 
Note that for SNRmin(^) — 00, the condition becomes m > {l + 5)(2k — r + l) for some ^ > 0. However, 
as SNRini„(y) decreases, the first term dominates and we need more sensor elements. 

C. Sufficient condition for partial support recovery using 2-thresholding 
Now, define the thresholding estimate as It = where 

p{j) = ||a*y|||. 



to > max < k{l + 5) 



4(^(j?) + 1) 
SNRmi„(r) - 1 
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Now, we derive sufficient conditions for the success of 2-thresholding in detecting k — r support when r is 
a small fixed number or when r is proportionally increasing with respect to k. 

Theorem 5.2: For a LSMMV(m, n, /c, r; e), suppose MSR^^^'^'' are deterministic sequences and 

rMSR'"/((72. (B)) 

min / V^minv // 



2 



m > 2(1 + 5) ^ ^ (V.4) 

' ^MSR(::r;)-V(2(2||B|| + ||7V||)||Ar||)/r' 



B{n,k,r) = < 



(V.5) 



where 

' log {n-k) + {\\B\\% - ra^-M) log ((n - fc)r), 

if r is a fixed positive integer 

^^i„(B)i + (11^111 - rai^^iB)) log ((n - k)r), 

if a := lim r/k > 

where 

and is the (k — r)-th value if we are ordering the values of ||x*|p for 1 < i < n with descending 

order. Then, 2-thresholding asymptotically finds a k — r sparsity pattern. 

Proof: See Appendix F. ■ 
• For noiseless single measurement vector (SMV) case, i.e. r = 1, if SNRinin(^) — oo, this becomes 

, 2 



(||x||Vlog(fc-l) + ||b||Vlog(n-fc) 

> 2(1 + S)^ inrrr^ 



Using Lemma E.2 in Appendix E, we have 



llblP 

lim = 1. (V.6) 



Hence, we have 

m>2(l + ^) (Vlog(fc-l) + Vlog(n-fc))^ 

mm \Xi\-' V / 



for some 5 > 0, as the sufficient condition for 2-thresholding in SMV cases. Compared to the result in 
[16] as 



m > 2(1 + 6) ." ", 1, (0^ + Vlog(n-fc))2, 

mm \Xj p 
jesuppX 
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our bound has a slight gain due to -^/log (fc — 1) and min^g/j \xj\'^, where |/f | = k — 1. This is because 
even for the SMV problem, the one remaining index can be estimated using the generalized MUSIC 



criterion 

|2 



If \\B\\p = rcr4in(-S), r is a fixed number and SNRniin(i^) oo, then our bound can be reduced as 

^ ^log (fc - r) + V'-^.nC^) ^ipg _ fe)^ ' 



m > 2(1 + ^) 



MSR^: 



when the measurement is noiseless. Using Lemma E.2 in Appendix E, this can be simplified as 



Therefore, the MMV gain over SMV mainly comes from ^/ (log (n — k))/r. 

If ||-B|||, = rcr^ijj(B) and lim„^oo r/k = a > 0, then under the condition (V.6) we have 

2 / . X 2 



l(fe-r) 

Therefore, the log (n — k) factor disappears, which provides more MMV gain compared to (V.7). 

D. Sufficient condition for partial support recovery using subspace S-OMP 

Next, we consider the minimum number of measurements for compressive MUSIC with S-OMP. In 
analyzing S-OMP, rather than analyzing the distribution of ||a*P^^^^ )B\\p where It denotes the set of 
indices which are chosen in the first t step of S-OMP, we consider the following version of subspace 
S-OMP due to its superior performance [37,50]. 

1) Initialize i = and 7o = 0- 

2) Compute P^(^a, ) which is the projection operator onto the orthogonal complement of the span of 

{a, : J e It}. 

3) Compute P^.^ .B and for all j = 1, • • • , n, compute p{t,j) = ||a*PR(pi 

4) Take jt — argniaxj^i^... ^„ p{t,j) and It+i = It^ {jt}. If f < fc return to Step 2. 

5) The final estimate of the sparsity pattern is Ik. 

Now, we also consider two cases according to the number of multiple measurement vectors. First, we 
consider the case when the number of multiple measurement vectors is a finite fixed number Conventional 
compressive sensing (the SMV problem) is this kind of case. Second, we consider the case when r is 
proportional to n. This case includes the conventional MUSIC case. 

Theorem 5.3: For LSMMV(m, n, fc, r; e), let SNRmi„(F) = an,in{AX)/\\N\\ and suppose the following 
conditions hold: 
(a) r is a fixed finite number. 
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(b) Let SNRmi„(r) satisfy 

SNR„,i„(r) > 1 + —{k{B) + 1). (V.8) 
r 



If we have 

21og(n- fc) 



m> k{l + S) 



^ 4k (k(S) + 1) 



(V.9) 



r SNR„,i„(y)-l_ 

then we can find fc — r correct indices of suppX by applying subspace S-OMP. 

Proof: See Appendix G. ■ 

• As a simple corollary of Theorem 5.3, when SNRi„in(l^) oo, we can easily show that the number of 
sensor elements required for the conventional OMP to find the all A;-support indices in SMV problem 
is given by 

m>2(l + (5)fclog(n-A:) , (V.IO) 

for a small ^ > 0. This is equivalent to the result in [16]. 

• When SNRinin(^) — oo, then the number of sensor elements for subspace S-OMP is 

k 

m > 2(1 + (5)- log (n-fc) 

for some 5 > 0. Hence, the sampling ratio is the reciprocal of the number of multiple measurement 
vectors. 

• Since fc ^ oo in our large system model, (V.8) tells us that the required SNRi„in(F) should increase 
to infinity. 

Next, we consider the case that r is proportionally increasing with respect k. In this case, we have the 
following theorem. 

Theorem 5.4: For LSMMV(m, n, fc, r; e), let SNRmin(>^) = amin(^-'^)/||iV|| and suppose the following 
conditions hold. 

(a) r is proportionally increasing with respect to k so that a := lim„_>.oo r{n)/k{n) > exist. 

(b) Let SNRmin(l^) satisfy 

SNR^i„(y)>l + l(«(S) + l). (V.ll) 
a 



Then if we have 



m>k{l + df — ^[2-F{a)f, (V.12) 



2 

a SNR„i„(y)-l 



for some S > where 



F(a) = - xdXiix), 
a Jo 



d\i(x) = (■\/(4 — x)x) I (27r.x) is the probability measure with support [0,4], < ti(pi) < 1 satisfies 
j|2ti(a) ^^^^^^-j _ ^ ^j^j dsi{x) = (l/7r)\/4"^-a?^ is a probability measure with support [0, 2]. Here, F{a) is 
an increasing function such that F{1) — 1 and limc_>o+ ^(") = 0. Then we can find k — r correct indices 
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of suppX by applying subspace S-OMP. 

Proof: See Appendix G. ■ 

• As a corollary of Theorem 5.4, when r{n)/k{n) 1 and SNRmin(y) oo, we can see that the 
number of sensor elements required for subspace S-OMP to find k — r support indices is given by 

m > {l + 5)k, 

for a small S > 0, which is the same as the number of sensor elements required for MUSIC. 

• We can expect that the number of sensor elements required for subspace S-OMP to find k — r support 
indices is at most 4(1 + 6)k in the noiseless case, where 5 > is an arbitrary small number. Hence, 
the log n factor is not necessary. 

• Unlike the case in Theorem 5.3, the SNR condition is now lower bounded by a finite number 1 + 
{4:/a){K{B) + 1). This implies that we don't need infinite SNR for support recovery, in contrast to 
SMV or Theorem 5.3. This is one of the important advantages of MMV over SMV. 

VI. Numerical Results 



In this section, we demonstrate the performance of compressive MUSIC. This new algorithm is compared 
to the conventional MMV algorithms, especially 2-SOMP, 2-thresholding and /2,i mixed-norm approach 
[31]. We do not compare the new algorithm with the classical MUSIC algorithm since it fails when r < 
k. We declared the algorithm as a success if the estimated support is the same as the true suppX, and 
the success rates were averaged for 5000 experiments. The simulation parameters were as follows: m € 
{1,2,..., 60}, n = 200, k E {1, 2, . . . , 30}, and r G {1, 3, 8, 16}, respectively. Elements of sensing matrix 
A were generated from a Gaussian distribution having zero mean and variance of l/w, and the suppX were 
chosen randomly. The maximum iteration was set to k for the S-OMP algorithm. 

According to (V.2), (V.9) and (V.12), for noiseless measurements, piece-wise continuous boundaries exist 
for the phase transition of CS-MUSIC with subspace S-OMP: 

fc -|- 1, r = k; 

2k-r + l, r<k; 
m > < (VI.l) 
(2fclog(n- fc))/r r«;fc. 

^ fc[2-F(a)]2, lim„^<x, r/fc > 0. 

Note that in our canonical MMV model, r = k includes many MMV problems in which the number of 
snapshots is larger than the sparsity level since our canonical MMV model reduces the effective snapshot 
r as r > k. Figure 2(a) shows a typical phase transition map of our compressive MUSIC with subspace 
S-OMP for noiseless measurements when n = 200 and r = 3 and ||x'|| is constant for all i = 1, • • • , n. 
Even though the simulation step is not in the large system regime, but r is quite small, so that we can expect 
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that {2k log {n — k))/r is a boundary for phase transition. Figure 2(b) corresponds to the case when r = 16 
and ||x*|| is constant for alH = 1, • • • ,n. Since in this setup r is comparable to k, we use the k[2 — F(a)]^ 
as a boundary. The results clearly indicates the tightness of our sufficient condition. 

Similarly, multiple piecewise continuous boundaries exist for the phase transition map for compressive 
MUSIC with 2-thresholding: 

k + 1, r = k; 

m> < 2k-r + l, r < k; (VI.2) 

_ 2 (\\X\\F^log{n - k)/V^ + ^B{n,k,r)/ry /MSR'^,:;, r < k. 

Since the phase transition boundary depends on the unknown joint sparse signal X through ||X||ir and 
MSR^7n ' we investigate this effect. Figure 3(a) and (b) show a typical phase transition map of our compres- 
sive MUSIC with 2-thresholding when r = 3 and r = 16, respectively, for noiseless measurements and ||x*|| 
are constant for all i; Figure 3(c) and (d) corresponds to the same case except ||x'||^ = (0.7)'. We overlayed 
theoretically calculated phase boundaries over the phase transition diagram. The empirical phase transition 
diagram clearly revealed the effect of the distribution X. Still, the theoretically calculated boundary clearly 
indicates the tightness of our sufficient condition. 

Fig. 4 shows the success rate of S-OMP, 2-thresholding, and compressive MUSIC with subspace S-OMP 
and 2-thresholding for 40dB noisy measurement when ||x'|| is constant for alH = 1, • • • ,n. When r = 1, 
the performance level of the compressive MUSIC algorithm is basically the same as that of a compressive 
sensing algorithm such as 2-thresholding and S-OMP. When r = 8, the recovery rate of the compressive 
MUSIC algorithm is higher than the case r = 1, and the compressive MUSIC algorithm outperforms the 
conventional compressive sensing algorithms. If we increase r to 16, the success of the compressive MUSIC 
algorithm becomes nearly deterministic and approaches the Iq bound, whereas conventional compressive 
sensing algorithms do not. 

In order to compare compressive MUSIC with other methods more clearly the recovery rates of various 
algorithms are plotted in Fig. 5(a) for S-OMP, compressive MUSIC with subspace S-OMP, and the mixed 
norm approach when p = 2,q = 1; and in Fig. 5(b) for 2-thresholding and compressive MUSIC with 
2-thresholding, when n = 200, m = 20 and r = 8,16, ||x*|| is constant, and SNR = 40dB. Note that 
compressive MUSIC outperforms the existing methods. 

To show the relationship between the recovery performance in the noisy setting and the condition number 
of matrices X, we performed the simulation on the recovery results for three different types of the source 
model X. More specifically, the singular values of X are set to be exponentially decaying with (i) r = 0.9, 
(ii) r = 0.7 and (iii) r = 0.5 respectively, i.e. the singular values of X are given by aj = t^~^ for 
j = I,- - ■ , rank(X). In this simulation, we are using noisy samples that are corrupted by additive Gaussian 
noise of SNR =40dB. Figure 6(a) shows the results when k — r entries of the support are known a priori 
by an "oracle" algorithm, whereas k — r entries of the support are determined by subspace S-OMP in Fig. 



22 



6(b) and by thresholding in Fig. 6(c). The results provide evidence of the significant impact of the condition 
number of X. 

Figure 7 illustrates the cost function to estimate the unknown sparsity level, which confirms that com- 
pressive MUSIC can accurately estimate the unknown sparsity level k as described in Lemma 4.6. In this 
simulation, n = 200, m = 40 and r = 5. The correct support size k is marked as circle. Note that C{k) 
has the smallest value at that point for the noiseless measurement cases, as shown Fig. 7(a), confirming our 
theory. For the 40dB noisy measurement case, we can still easily find the correct k since it corresponds to 
the first local minimizer as k increases, as shown in Fig. 7(b). 

VII. Discussion 

A. Comparison with subspace-augmented MUSIC [50] 

Recently, Lee and Bresler [50] independently developed a hybrid MMV algorithm called as subspace- 
augmented MUSIC (SA-MUSIC). The SA-MUSIC performs the following procedure. 

1) Find k — r indices of suppX by applying SOMP to the MMV problem U = AX where the set of 
columns of U is an orthonormal basis for R{B). 

2) Let Ik-r be the set of indices which are taken in Step 1 and S = Ik-r- 

3) For j e {1, • • • , n} \ Ik-r, compute 77(j) = ||(5*aj|p where Q S K™x('"-*=) consists of orthonormal 
columns such that Q*[U Aif,_^] = 0. 

4) Make an ascending ordering of ri{j), j ^ Ik-r, choose indices that correspond to the first r elements, 
and put these indices into S. 

By Theorem 4.5(c), we can see that the subspace-augmented MUSIC is equivalent to compressive MUSIC, 
sinces the MUSIC criterion in subspace-augmented measurement [U Ai^_^,] and the generalized MUSIC 
criterion in compressive MUSIC are equivalent. Therefore, we can expect that the performance of both 
algorithm should be similar except for the following differences. First, the subspace S-OMP in Lee and 
Bresler [50] is applying the subspace decomposition once for the data matrix Y whereas our analysis for 
the subspace S-OMP is based on subspace decomposition for the residual matrix at each step. Hence, our 
subspace S-OMP is more similar to that of [37]. However, based on our experiments the two versions of 
the subspace S-OMP provide similar performance when combined with the generalized MUSIC criterion. 
Second, the theoretical analysis of SA-MUSIC is based on the RIP condition whereas ours is based on large 
system limit model. One of the advantage of RIP based analysis is its generality for any type of sensing 
matrices. However, our large system analysis can provide explicit bounds for the number of required sensor 
elements and SNR requirement thanks to the Gaussian nature of sensing matrix. 

B. Comparison with results ofDavies and Eldar [37] 

Another recent development in joint sparse recovery approaches is the rank- awareness algorithm by Davies 
and Eldar [37]. The algorithm is derived in the noiseless measurement setup and is basically the same as our 
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subspace S-OMP in Section 5 except that a.j in step 3 is normalized after applying R 



to the original 



dictionary A. For the full rank measurement, i.e. r = k, the performance of the rank-aware subspace S-OMP 
is equivalent to that of MUSIC. However, for r < k, the lack of the generalized MUSIC criterion may make 
the algorithm inferior since our generalized MUSIC criterion can identify r support deterministically whereas 
the rank-aware subspace S-OMP should estimate the remaining r support with additional error-prone greedy 
steps. 

C. Compressive MUSIC with a mixed norm approach 

Another important issue in CS-MUSIC is how to combine the general MUSIC criterion with non-greedy 
joint sparse recovery algorithms such as a mixed norm approach [31]. Towards this, the fc — r greedy step 
required for the analysis for CS-MUSIC should be modified. One solution to mitigate this problem is to 
choose a fc — r support from the non-zero support of the solution and use it as a partial support for generalized 
MUSIC criterion. However, we still need a criterion to identify a correct k — r support from the solution, 
since the generalized MUSIC criterion only holds with a correct k — r support. Recently, we showed that 
a correct k — r partial support out of fc-sparse solution can be identified using a subspace fitting criterion 
[51]. Accordingly, the joint sparse recovery problem can be relaxed to a problem to find a solution that 
has at least fc — r + 1 correct support out of fc nonzero support estimate. This is a significant relaxation of 
CS-MUSIC in its present form that requires k — r successful consecutive greedy steps. Accordingly, the new 
formulation was shown to significantly improve the performance of CS-MUSIC for the joint-sparse recovery 
[51]. However, the new results are beyond scope of this paper and will be reported separately. 

D. Relation with distributed compressive sensing coding region 

Our theoretical results as well as numerical experiments indicate that the number of resolvable sources can 
increase thanks to the exploitation of the noise subspace. This observation leads us to investigate whether CS- 
MUSIC achieves the rate region in distributed compressed sensing [25], which is analogous to Slepian-Wolf 
coding regions in distributed source coding |52|. 

Recall that the necessary condition for a maximum likelihood for SMV sparse recovery is given by [16]: 



as SNR oo. Let mi denote the number of sensor elements at the «-th measurement vector If the total 
number of samples from r vectors are smaller than that of SMV-CS, i.e. Yl^^i rrii < k — 1, then we 
cannot expect a perfect recovery even from noiseless measurement vectors. Furthermore, the minimum 
sensor elements should be = fc to recover the values of the «-th coefficient vector, even when the fc 
indices of the support are correctly identified. Hence, the converse region at SNR — >^ oo is defined by the 
rrii < k,i = 1, - ■ ■ , r as shown Fig. 8(a)(b). 




SNR- MSR 



'mm 
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Now, for a fixed r our analysis shows that the achievable rate by the CS-MUSIC is rrii = 2fclog(n — k)/r 
(Fig. 8(a)). On the other hand, if lim„^oo r/k = a > 0, the achievable rate by the CS-MUSIC is rrii = 
(2 — F{a))'^k as shown in Fig. 8(b). Therefore, CS-MUSIC approaches the converse regin at r fc, whereas 
for the intermediate ranges of r there exists a performance gap from the converse region. However, even in 
this case if we consider a separate SMV decoding without considering correlation structure in MMV, the 
required sampling rate is m,i > 2k \og{n — k) which is significantly larger than that of CS-MUSIC. This 
analysis clearly reveals that CS-MUSIC is a quite efficient decoding method from distributed compressed 
sensing perspective. 

E. Discretization 

The MUSIC algorithm was originally developed for spectral estimation or direction-of-arrival (DOA) esti- 
mation problem, where the unknown target locations and bearing angle are continuously varying parameters. 
If we apply CS-MUSIC to this type of problems to achieve a finer resolution, the search region should be 
discretized more finely with a large n. The main problem of such discretization is that the mutual coherence 
of the dictionary A approaches to 1, which can violate the RIP condition of the CS-MUSIC. Therefore, the 
trade-off between the resolution and the RIP condition should be investigated; Duarte and Baraniuk recently 
investigated such trade-off in the context of spectral compressive sensing [53]. Since this problem is very 
important not only for the CS-MUSIC but for SMV compressed sensing problems that are originated from 
discretizing continuous problems, systematic study needs to be done in the future. 

VIII. Conclusions and future works 

In this paper, we developed a novel compressive MUSIC algorithm that outperforms the conventional 
MMV algorithms. The algorithm estimates k — r entries of the support using conventional MMV algorithms, 
while the remaining r support indices are estimated using a generalized MUSIC criterion, which was derived 
from the RIP properties of sensing matrix. Theoretical analysis as well as numerical simulation demonstrated 
that our compressive MUSIC algorithm achieved the Iq bound as r approaches the non-zero support size k. 
This is fundamentally different from existing information theoretic analysis [39], which requires the number 
of snapshots to approach infinity to achieve the lo bound. Furthermore, as r approaches 1, the recovery rate 
approaches that of the conventional SMV compressive sensing. We also provided a method that can estimate 
the unknown sparsity, even under noisy measurements. Theoretical analysis based on a large system MMV 
model showed that the required number of sensor elements for compressive MUSIC is much smaller than 
that of conventional MMV compressive sensing. Furthermore, we provided a closed form expression of the 
minimum SNR to guarantee the success of compressive MUSIC. 

The compressive sensing and array signal processing produce two extreme approaches for the MMV 
problem: one is based on a probabilistic guarantee, the other on a deterministic guarantee. One important 
contribution of this paper is to abandon such extreme viewpoints and propose an optimal method to take 
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the best of both worlds. Even though the resulting idea appears simple, we believe that this opens a new 
area of research. Since extensive research results are available from the array signal processing community, 
combining the already well-established results with compressive sensing may produce algorithms that may 
be superior to the compressive MUSIC algorithm in its present form. Another interesting observation is 
that the RIP condition < 1, which is essential for compressive MUSIC to achieve the Iq bound, is 

identical to the Iq recovery condition for the so-called modified CS [54]. In modified CS, r support indices 
are known a priori and the remaining k — r are estimated using SMV compressive sensing. The duality 
between compressive MUSIC and the modified CS does not appear incidental and should be investigated. 
Rather than estimating k — r indices first using MMV compressive sensing and estimating the remaining 
r using the generalized MUSIC criterion, there might be a new algorithm that estimates r supports indices 
first in a deterministic pattern, while the remaining fc — r are estimated using compressive sensing. This 
direction of research might reveal new insights about the geometry of the MMV problem. 

Appendix A: Proof of Theorem 4.1 

Proof: (a) First, we show that spark(Q* A) < A: - r -|- 1. Since Q*AX = 0, we have Q* Ax.i = for 
1 <i <r. Take a set P C suppX with \P\ = r — 1. Then, there exists a nonzero c = [ci, • • • , Cr] € ffi'" 
such that 

X^c = 0, where e M^'^-^)^'', (A.I) 

where X^ denotes a submatrix collecting rows corresponding to the index set P. 

Since the columns of X are linearly independent, Y^^^i CjXj ^ but Q*A{J2l^i CjXj) = 0. By (A.l), 

r 

\\J2^i^i\\o <k-r + l 

i=l 

SO that spa.Tk{Q*A) <k-r + l. 

(b) Suppose that there is x e M" \ {0} such that 

Q*Ax = 0, ||x||o < fc — r -h 1 and supp(x) ^ suppX. 

Since Q*Ax = 0, Ax e R{Q)'^ = R{B) so that there is a x such that Ax = Ax and supp(x) c suppX. 
Hence, we have 

A{x - x) = 0, ||x - x||o <2k-r + l. 

By the RIP condition < S2i^_^_^_^{A) < 1, x = x. It follows that whenever ||x||o < k — r + 1 and 
Q*Ax = 0, we have supp(x) c suppX. Since Ax e R{B) = R{AX), there is a y e R{X) such that 
Ax = Ay. Hence, if Q*Ax = and ||x||o < A; - r -|- 1, by the RIP condition of A, we have x e R{X). 
Finally, it suffices to show that for any x e R{X) \ {0}, 



||x||o > fc-r + 1. 
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Suppose that ||x||o < k — r. Then there is a set Z such that \Z\ = r and Z C suppX\supp(x). Then, there 
exists acGR'~\{0} such that 

X^c = 0, where e W''. 
This is impossible since the nonzero rows of X are in general position. 

■ 

Appendix B: Proof of Theorem 4.3 and Corollary 4.4 

Proof of Theorem 4.3: In order to show that (IV.2) impUes j G suppX, let Ik-r be an index set with 
\Ik-r \ = k — r and Ik-r C suppX. Then by Lemma 4.1 and the definition of the spark(A), 

Tank[Q* Ai^_^] = k-r. 

By the assumption, there is an Xfc_r e M'^"'" such that Q*aj = Q*Ai^_^Xk-r so that we have 

Q*[aj-Ai,_^Xk-r] = 0. 

Since a.j - Ai^_^Xk-r € N{Q*) = R{Q)-^ = R{B), there is a x e such that supp(x) c suppX and 
Bj — Aii,_^Xk-r = ASt. Hence we have y e M" such that Ay = and supp(y) C suppX U {j} U Ik-r 
so that ||y|| < 2fc - r + 1. By the RIP condition < 5^^._^_^_-^{A) < 1, it follows that {j} U supp(xfe_r) = 
supp(x) c suppX since j ^ Ik-r- Hence, under the condition (IV.2), we have j G suppX. 
In order to show that j G suppX impUes (IV.2), assume the contrary. Then we have 

r&nk{Q*[Ai^_^,a.j]) = k-r + l, 

where Ik-r C suppX with \Ik-r\ = k — r. Then for any x^.^ e M'^"'', Q*[B.j — Ai^_^Xk-r] ^ so that 
a-j - Ai^_^Xk-r i R{B). Set P = suppX \ {h-r U {j}) so that \P\=r- 1. Then there is a c e M'' \ {0} 
such that 

X^c = 0, where X^ e M^'^-^)^''. 

Then we have ||-X'c||o = k — r + 1 since the rows of X are in general position. Note that supp(Xc) = 
{j}Ulk-r- Since AXc e R{B), aj — Ai^_^Xk-r € R{B) for some Xk-r G R''~^, which is a contradiction. 

Proof of Corollary 4.4: Here we let Gi^_^ := Q* Ai^_^ and gj = Q*aj. Since we already have 
rank[G/^_^] = k — r, (IV.2) holds if and only if 

det [Gj,_,,g,]*[G,,_,,g,]=0. 
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Note that 



[G/,_.,gj] = 



where det [-A*j^_^Pii(^Q)Aj^_^] > because of rank[G/^_^] = k — r. Since 

^jPR{Q)Al^-r ^jPR{Q)^j 

det(AJ^_^Pij(Q)A7,_J det(a*PR(Q)a,- - a*PR^Q)Ai,_^{A*j^_^PR^Q^Ai,_^)-^A*j^ 



det 



(IV.2) is equivalent to 



= a*Pfl(Q)a,-a*Pfl(Q)A7,_(A*^_PR(Q)^/,_J M*^_PR(Q)a,- 



Pr{q) - PR{PmQ)A,^_ 



where Pr{q) = QQ*- Hence (IV.3) holds if and only if j e suppX. 

Appendix C: Proof of Theorem 4.5 and Lemma 4.6 



Proof of Theorem 4.5: (a) By the definitions of U and Q, we have t/*(3 = so that 

[UU* + PQQ'A,,_f = UU* + UU*QQ*Ai,_M*i,..QQ*Ai,_X^AX_^QQ* 

+QQ*Ai^_M*i^.^QQ*Ai,_X^A]^_^QQ*UU* 

+QQ*Ai,_M%..QQ*Ai,_X'A}^_^QQ*Ai,_M%..QQ*Ai,_X^A%_^QQ 
= UU*+Pqq,a,^_^. 

Since UU* +Pqq'Ai^_^ is a self-adjoint matrix, it is an orthogonal projection. Next, to show that R{UU* + 
Pqq* Ai^_^) = R{B) + R{QQ* Ai^_^), we only need to show the following properties : 

(i) [UU* + Pqq.^,^_ Jb = b for any b G R{B), 

(ii) [UU* + Pqq'A.^Jch = qi for any qi G i?(QQ*A/,_ J, 

(iii) [UU* + Pqq'Aj^Jci2 = for any qz G R{Q) n i?(QQM/,_ J^. 

For (i), it can be easily shown by using Q*h = and UU*h = h for any b G R{B). For (ii), any 
qi G R{QQ* Aji,_^), there is a w G M'''^'' such that qi = QQ* A/j._^w. Then by using the property 
U*Q = 0, we can see that property (ii) also holds. Finally, we can easily see that (iii) also holds by using 
{7*q= for any q G R{Q). 
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(b) This is a simple consequence of (a) since QQ* — Pqq*Ai^ = I — [UU* + Pqq'Ai^, ] and R{Q) D 
R{QQ*Ai^_^)-'- is an orthogonal complement of R{B) + R{QQ*Ai^_^). 

(c) Since spark((5* A) = fc — r + 1 by Lemma 4.1, [U Aij^_^] has k linearly independent columns. Hence 
we only need to find the orthogonal complement of R{[U A/^^^]) = R{U)+R{Ai^_^.) = R{B)+R{Ai^_^). 
Since R{U)^ = R{Q), we have R{U) + R{Ai^_^) = R{U) + R{PqAi^_^) by the projection update rule 
so that {R{U) + R{QQ*Ai^_J)^ = R{Q) n i?(QQ*A/,_J-L is the noise subspace for [U Ai^_^] or 
[B A,,_^. 

Appendix D: Proof of Theorem 5.1 
We first need to show the following lemmas. 

Lemma D.l: Assume that we have noisy measurement through multiple noisy snapshots where 

Y = AX + N, 

where A € R™^", X G M"'"', and N e M"'"' is additive noise. We also assume that h-r C suppX. Then 
there is a 77 > such that for any j ^ suppX and / G suppX, 

[Pr{Q) - ^R(Pb(q)A7,_jJ > ^1 [Pr(q) - -Pfl(Pfl«5)A7,_jJ a; (D.l) 

if II A^ll < T], where ||A^|| is a spectral norm of N and Q G ]^mx{m-k) consists of orthonormal columns such 
that Q*Y = 0. 

Proof: First, here we let B ~ AX, crmin(S) (or amax{B)) be the minimum (or the maximum) nonzero 
singular value of B. Then, Y = B + N is also of full column rank if ||A^|| < cTmin(-B). For such an A'', 

\\Pr(y) - Pr(b)\\ = \\Y{Y*Y)-^Y* - B{B*B)-^B*\\ 
= \\{B + N)[{B + N)*{B + N)r\B + N)* - B{B*By^B*\\ 

< \\N\\\\[{B + Ny{B + N)]-\B + Ny\\ 

+\\{B + N)[{B + Ny{B + N)]-^\\\\{B + Ny{B + N) - B*B\\\\{B*B)-'^B*\\ + \\B{B*B)-'^ 

< ||Ar||(B + 7V)t|| + ||(B + 7V)t||||Bt||[2||B||||7V|| + ||Ar||2] + ||Bt||||7V|| 

by the consecutive use of triangle inequality. If we have ||A''|| < crinin(-B), we get 

\\{B + Ny\\<{a^,4B)-\\N\\)-' 



so that 



Y-B\\ - arain{B) - \\N\\ a,„i„(B)((7n,in(B) - ||iV||) ' ^ a^i„(B) 

2(a^ax(S)+a„,i„(B)) 



(Tu.in{B){a^i^{B)-\\N\\) 



(D.2) 
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where we use ||-B^|| = l/(c7min(-B)) and = crmax(-B). By the projection update rule, we have 



Pr{B) + Pr(P^^^^A,^_^) - I- [Pr{Q) - -Pr(Pb(q)Ai^_ J 



and similarly, 



(D.3) 



Pnav - Pr(y) + PR{P^f^^Aj^_^) - I- [Pr(q) - PRiPH(Q)^'k-r) 



(D.4) 



By applying (D.3) and (D.4) as done in [50], we have 



Wi^RiQ) - PR(P^^Q^A,^_^)] - [Pr{Q) - PR(Pn(Q)Ai^ 



< \\Pr(y) - Pr{b)\ 



(D.5) 



Then, for any j ^ suppX and I G suppX, by the generalized MUSIC criterion (IV.3) we have 



Pr(Q)- PR(PRiQ)A,^_^) 

Pr(Q) - PR{Pn^Q)A,^_^) 



Pr(Q) - Pr(Priq)Ai^-J 

Pr(q) - PR{PR^Q~,AI^_^) 



a; 
a; 



\.Pr{Q) - PR(PR^Q)Alk-r)^ - [Pr{Q) - Pr{Pr(q)Ai^_^)\ 

[Pr(Q) - Pr{Pr(,q)Ai^-.)] - [Pr{Q) - Pr{Pr^q)Ai^_^)] 



a; 



> min a* 
:/^suppX 



> min a. 



j^suppX 



if we have 



Pr(Q) - Pr(Pr^q)A,^_,) 



Pr(Q) - PRiPR(Q)^Ik-r) 



< 



aj - 2max(||aj||^, ||ai||^)||PR(y) - Pr(b)|| 

a, 2maxJ|a,|| ^^^^^B){a^,^{B) - \\N\\) >° 



4(fTmax(-B) + Crmin(-B)) + C7min(S)C 



(D.6) 



where 



C:= 



j^suppJf 



Pr(Q) - PR(PRiQ)Ai^_^) 



max a 
l<j<n " • 



..||2 



Lemma D.2: Suppose a minimum SNR is given by 



SNR^i„(r) := ^^^^ > rj, 



where 



r? := 1 + 



H<B) + 1) 
C 
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k{B) is the condition number of B = AX and 



mm au 

j^suppX 



C:= 



Then, for any j ^ suppX and I S suppX, 



max a, 

l<j<n 



Pr{Q) - 'Pr(p„(q)A,,_j 



a;. 



Proof: Using (D.6), the generaUzed MUSIC correctly estimates the r remaining indices when 

0'min(-B)C 



< 



4(«(B) + 1)+C 



where we use the definition of the condition number of the i? = AX matrix, i.e. k{AX) = ^""^^^ j . This 
implies that 



SNR^i„(y) > 1 + 



4(^(i3) + 1) 
C 



This concludes the proof. ■ 
Corollary D.3: For a LSMMV(m, n, k, r; e), if we have Ik-r C suppX and a minimum SNR satisfies 



SNR^i„(F) > 1 + ^^^^^^ t ^ 1 + + 1) 

1 — 7 



(D.7) 



where 7 = lim„_>.oo \/k{n)/m{n), then we can find remaining r indices of suppX with generalized MUSIC 
criterion. 

Proof: It is enough to show that 

lim C,{n) = 1 — 7^. 

n— >oo 

First, for each 1 < j < n, m||aj|p is a chi-square random variable with degree of freedom m so that we 
have by Lennma 3 in [16], 



lim 



l<j<n 



m 



since lim„^(x) (log n)/m = 0. On the other hand, for any j ^ suppX, is independent from P^k^q) — 

Rj is a chi-square random variable with degree of 



PR{Pa^ci)Ai^_^) SO that ma* 



Pr{Q) - PR(Pn(Q)Aik-r) 



freedom m — k since — -Pr{P[hq-,a,^ ) is the projection operator onto the orthogonal complement 

of R[B A/j, _^]. Since lim„^oo(log (n — k))/{m — k) =Q, again by Lenraia 3 in [16], we have 



lim 



i^suppJC 



Pr{Q) - PR(PR^Q)AT^-r) 



m — k 



= 1 
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SO that 



iim 



max a 

i<j<n ■ 



J|2 



lim 



mm 

j^suppX 



TO 



m — k 



m — k 



max a, 

l<j<n 



TO 



1-7 < 1. 



Proof of Theorem 5.1: First, we need to show the left RIP condition < < 1 to apply the 

generaUzed MUSIC criterion. Using Marcenko-Pastur theorem [55], we have 



\\ms\xp52k-r+i = 1 - liminf(l - ^{2k-r + l)/mf < 1. 

Hence, we need to > (1 + 5){2k — r + 1) to make limsup^^,^ 52k-r+\ > 0- Second, we need to calculate 
the condition for the number of sensor elements for the SNR condition (D.7). Since 7 = lim^^oo \fkjm, 
we have (D.7) provided that 

TO>M1 + ^)^^ '^'^^^^^'^ 



1- 



SNR^i„(y) - 1 
Therefore, if we have SNRmi„(Y') > 1 + 4(k(B) + 1) and 

4(k(5) + 1) 



TO>ma.|Ml + ^)L^ SNR.,(y)-l 
then we can identify the remaining r indices of suppX. 



(l + <5)(2fc-r + l) 



Appendix E 



The following two lenmias are quite often used in this paper. 

Lemma E.l: Suppose that r is a given number, and is a set of i.i.d. chi-squared random variables 

with degree of freedom r. Then 

(n) 

u) 

lim max = 1 

n^oo j=i,--- ,n 2 logn 

in probabiUty. 

Proof: Assume that Zr is a chi-squared random variable of degree of r, then we have 

-(--^> = ^' 

where T{k,z) denotes the upper incomplete Gamma function. Then we use the following asymptotic behavior 



P{Zr > a;} ~ T^T^x'^/^-^e-^/^ as a; ^ 00. 
r(r/2) 
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For n oo, we consider the probability P{maxi<j<„ m^"^ > 2(1 + e) logn}. By using union bound, we 
see that 

,(") 



P { max u) ' > 2(1 + e) logn 

l<j<n 



as n ^ 00. Now, considering the probability P{maxi<j<„ u^"^ < 2(1 — e) logn}, we see that 



P \ max u'f^ < 2(1 + e) logn i 

[l<j<Tl J 

< (l - f(^(2(l - e)logn)'-/2-ie-(i-)'°s"y 



as n — )• oo so that the claim is proved. ■ 
Lemma E.2: Let A e M"*><" be the Gaussian sensing matrix whose components Uij are independent 
random variable with distribution J\f{0,l/m). Then 



lim '' , = 1. 

n—>-oo ||A 111^ 

Proof: Because Ojj J\f{0,l/m) for all 1 < i < m and 1 < j < n, m||aj||^ is a chi-squared random 
variable of degree of freedom m so that by Lemma 3 in [16], we have 

lim max Ha^f = lim min ||aj|p = 1. (E.2) 

n— >cx) l<j<n n—>oo l<j<n 

Since, for fixed 1 < j < n, ai{i j) is m-dimensional random vector that is nonzero with a probabiUty of 
1 and independent of a.j, the random variable u{i,j) = a|aj/||aj|| is a Gaussian random variable with a 
variance of 1/m by applying Lenama 2 in [16]. Since we have (E.2) and the variance of u{i,j) goes to 
as n — )• oo, 

lim a* a, = (E.3) 

n^oo 



for all 1 < j < fc < n. Then 

WAXWl _ tTace{X*A*AX) 
\\X\\% ~ tT&ce{X*X) 

as n — >■ 00. 



^ 1 



Appendix F: Proof of Theorem 5.2 
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Proof of Theorem 5.2: Let /( C suppX with |/t| — k — r, where It is constructed by the first k — r 
indices of X if we are ordering the values of ||x*|p for 1 < i < n with decreasing order. Then for i G It, 

a*B = ||ai||V+a|£;' 
where Ei = [e\,--- , e^] e R"^'' and = b; - a^Xi,,. Then 
a*BB*ai = ||||ai|| V + a*^!^ 

> |||a,|n|x^||-||a*£;^||r= VA,-, >>,^Z; (F.l) 

where 

A, = ||a,r||xl2, Bl = ||a,f llejf , = 



First, by Lemma 3 in [16], lim„_>cx) supjgj^ ll^ilP = 1 so that we have 

Ai 



lim inf ■ 



lim inf r ***** **** r 

n^oo rMSR MSR r 

' ^ min ^ min 



> 1 



by the definition of MSR„,:J and the construction of /, . 

111 111 

For Bl, observe that each is a Gaussian m-dimensional vector with total variance 



(F.2) 



:= E[\\eir]<E[\\hir] = \\^ir 

and (m/VJ*)||ej|p is a chi-squared distribution with a degree of freedom m fox i € It. Hence using Lemma 
3 in [16] and 



log (fc - r) ^ logm ^ ^ 



m 



m 



as n oo so that 



so that we have 



lim sup max — - — -r^ — < lim sup max — — < 1 



lim sup max - 



B] 



^eh Xi 



< 1 



for i G It and 1 < Z < r. Finally, 



i^h X; 



|a.|P||el|P 



(F.3) 



follows beta distribution Bcta(l,TO — 1) as shown in [16]. Since there are fc — r terms in It, Lenmia 6 in 
[16] and inequality (F.3) shows that 



lim sup max — Zf < 1 

n^oo i^it 2 log (k — r) 
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SO that we have 

r 

l°^r?e^2rlogTfc-r) tiller ^ g\"-.^^P?e^f ^ 21og(I - r)^' ^ ^^"^^ 
For i ^ suppX, we have 

r 

\\^*Bf = a:SB*a,=5^af(B)||a>Hl' 

r r 

1=1 ;=i 

where B = UYiV is the singular value decompostion of B,U = [ui , ■ • • . u;] and S = diag[(Ti (S), • • • , ar{B)] 
where (Ti{B) > • • • > ar{B) — (Tmin(S) > 0. As will be shown later, the decomposition in the second line 
of the above equation is necessary to deal with different asymptotic behavior of chi-square random variable 
of degree of freedom 1 and r. Since is statistically independent from {u;}[^j^ for i ^ suppX and {u;}[^j^ 
is an orthonormal set, 'Ti||a*u;|p is a chi-squared random variable of degree of freedom r and each 

m||a*u;|| is a chi-squared random variable of degree of freedom 1. Also, we have 



E(<7f(B)-a^i„(B))m||a*u;f 



limsup — — < limsup a^^„(B) (F.5) 

n^oo 2rlog((n-fc)r) " „^cx> ' " ™^ ^' 



|2 



Since 



m\\ajUi 



|2 



limsup max — j-^ — < 1 

n->-oo iisuppX,l<l<r 2 log ((n - fcjrj 

by Lemma 4 in [16]. When r is a fixed number, then by Lenmia E.l, we have 



E m\\a.*ui 



|2 



lim max — — - = 1. (F.6) 

n->-oo j"=l,--- ,n 2 log (n — fc) 

On the other hand, when r is proportionally increasing with respect to k [16], then we have 



Em||a*u;||2 

max 

Combining (F.6), (F.7) and (F.5), we have 



lim max — = 1. (F7) 



limsupM^<l (K8) 

„-j.oo 2B(n,k,r) 

for j ^ suppX, when B{n, k, r) is given by (V.5). 
For the noisy measurement Y, we have for all I < j < n, 

limsup I ||a*S||2 - ||a*F||2| < limsup ||a,||2(2||B||||iV|| + ||iV||2) = (2||B|| + ||iV||)||iV||. (F.9) 
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Let 

A:=MSR^:, ^:=^21og(/e-r) 

2(2||g|| + ||Ar||)||Ar|| ^ ^_ 2B{n,k,r) _ 
r ' r 

Then, for i e suppX, combining (F.l), (F.2), (F.4) and (F.9), we have 

l^^-^^i ^htYf-^inB\\ + \\N\\)\\N\\ ^ ^.^.^^m||a*S||-2m(2||B|| + ||7V||)||Ar|| 



n->-oo 

> lim inf ^ <- . 

n— >oo c 

On the other hand, using (F.8) and (F.9), for j ^ suppX we have 

m||a*y|i2-m(2||B|| + ||7V||)||7V|l m||a*B||2 
hmsup — — — L_U_ii y — liiii — II < hmsup — " ' < 1 

n— >oo n— >oo 

SO that we need to show that 

lim inf ^ > 1 + ^ (F.IO) 

under the condition (V.3) and (V.4). First, note that A > 2^ if and only if 

rMSR(^:>2(2||i3|| + ||7V||)||^||. 

which is equivalent to that 

2 ,R\" SNRLn(y) - 4«(i?)SNR„i„(y) - 2 > 

which holds under the condition (V.3), where we used the definition k;(B) := ||-B||/criniii(-B) = (7max(-B)/crmin(-B). 
Then we can see that if we have \/m > r/^ ^ , then 

~ \J\—\/v 

{VXV^ - - lym > [{VX - V^)V^ - . (F.ll) 

Also, if we have 



m> 



then 



^ _ - . S. (F.12) 



Hence, by applying (F.ll) and (F.12), if we assume the condition 



then the inequality (F.IO) holds so that we can identify It C suppX by 2-thresholding. 
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Appendix G: Proof of Theorem 5.3 and 5.4 

In this section, we assume the large system limit such that p, e, a and 7 exist. We first need to have the 
following results. 

Theorem G.l: [55] Suppose that each entry of A s M'"^*'^ is generated from i.i.d. Gaussian random 
variable A/'(0, 1/m). Then the probability density of squared singular value of A is given by 



^ ^^^tweEEHEjH (0.1) 



where 7 = lim„^(x) \fkjm. 

Corollary G.2: Suppose that each entry of A e j^mxfe generated from i.i.d. Gaussian random variable 
^(0, 1/m). Then the probabiUty density of singular value of A is given by 

^jWTWE^MHEm. ,G.2, 

Proof: This is obtained from Theorem G.l using a simple change of variable. ■ 
Lemma G.3: Let r < fc < m be positive integers and A e R™^''. Then for any r-dimensional subspace 
W of R{A), we have 

\\A*Pw\\%>i2''l-J+i(^) 

where ai{A) > (72{A) > ■ ■ ■ > ak{A) > 0. 

Proof: Let A* = UT,V* be the extended singular value decomposition of A* where 

S = diag[(Ti,(T2, • • • jCTm], 
V = [V1,V2,--- ,Vm] 

and CTfe+i = ak+2 = ■■ ■ = am = 0. If we let Z = V*Pw, then we have 

\\Zfp = ti:&ce{PwVV* Pw) = trace(Pw) = r (G.3) 

and 

\\A*Pw\\l = \\A*VZ\\l = \\UtZ\\l 
If we let Z = [zj , • • • , z;^]*, since is a subspace of R{A) and R{A) = N{A*)-^, we have 

Zfe+l = = • • • = Zm = 



and 



k 

1=1 
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by (G.3). Since < ||zj||2 < 1 for 1 < j < fc, using ai{A) > a2{A) >■■■> ak{A), we have 

\\A*Pw\\% = \\UtZ\\% = j2<^f{A)\\zif >j2'^l-j+iiA). 

1=1 j=i 

Lemma G.4: For < 7 < 1 and < a < 1, we let < t^{a) < 1 which satisfies J^ij^T+27*-y(«) (is^(^j.) 
a where dsj{x) is the probabiUty measure which is given by 

ds (x) := 1 v{{i+7r-x^){x^-{i-^r) _ 

' 777^ X 

Then we have for any < 7 < 1, t^(a) > ti{a). 

Proof: It is sufficient to show that for any Q <t <1 and < 7 < 1, 

"1-7+27* f2t 



/ dsj{x) < / dsi{x 

Jl-j Jo 



) (G.4) 



-7 

By substituting s = (a; — (1 — 7))/7, we have 

fl-7+27t p2t 



/ ds-y{x) = / dso^-y{x) 

Jl-'y Jo 



where 

, . , ^/i^/2^Vs + + 2(1 - 7)77 

7r(s + (l-7)/7) 

By Lemma G.5, there is only one root for dso^-y{x) = dsi{x) in (0, 2) and rfsi(O) > dso,^(0). Then there 
is some s* e (0, 2) such that dsi{x) > dso,-y{x) for a; < s, and dsi{x) < dso,-y{x) for a; > s* so that 

/ dsi{x) > / dso,7(a;) for < 2i < s» 
Jo Jo 

and /q^* dsi{x) — Jq* dso,j{x) is a decreasing function on (s*, 2) such that 



/ dsi{x) = / dso,'y{x) = 1. 
io Jo 



Hence, for any t € (0, 1), 

r2t r^t 



/ dsi{x) > / rfso,7(a;) 
Jo Jo 



so that (G.4) holds. ■ 
Lemma G.5: Let ds\{x) and ^50,7(2;) be probabiUty density functions with support [0,2]. Then these 
probabiUty density functions have only 1 intersection point in (0,2). 
Proof: For s e (0, 2) 



^V2^^s + 2/7v/s + 2(1 - 7)/7 
(s + (1 - 7)/7) 



,2 
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if and only if 



s{s + 2/r){s + 2(1 - 7)/7) = (s + 2)(s + (1 - 



Expanding both sides, we have 

(2/7 - 2y + (4(1 - 7)/7' - (1 - ifh' - 4(1 - l)h)s - 4(1 - 7)77^ = 
so that there is only 1 positive root. If we assume that dso,j{x) and dsi{x) have no intersection point, then 

/ dsi{x) > / dso^^{x) 
Jo Jo 

since dsi{0) > dso,'y{0). This is a contradiction so that there must be 1 root for dsQ^^{x) = dsi{x) in (0, 2). 

■ 

Proof of Theorem 5.3 and 5.4: Note that S-OMP can find k — r correct indices from suppX if we have 
max \\a*jPii,p^ ^JP > max lla^J-PRfp-L sill^ (G.5) 

for each < t < fc — r, since ||a*P;^fpj_ = for j e suppX n /(. Hence, it is enough to check 

that the condition (G.5) for <t < k - r. 

First, for j ^ suppX, since is statistically independent of P^(j^^ ^Y. For t < k — r, the dimension of 
^R(Ai )^ ^ ^\\^^^R{PBiA °^ chi-squared distribution of degree of freedom r. 

On the other hand, for j e suppX, we have 



k 



by using R{P^f^Aj c R{As) and Lemma G.3, where As have singular values < (Tk{As) < 
<^k-i{As) <■■■< (Ti{As). If we let 



then by (G.l), we have 



:= 1 v/((l+7)^-a;^)(a^^-(l-7)^) 



n— >oo 

where < tj{a) < 1 is the value satisfying 

r-l-7+27t^(a) 
'1-7 



E^fe-j + l(^s) (i_^+2^^(a))2 

lim = / xdXj{x) (G.6) 



(1-7)2 



/■l-7+27^(a) ^ 

/ (is-.(x) = a = lim — 

Jl-7 K 
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Using Lemma G.4, we have ^(q;) > ti{a) for all < a < 1 and < 7 < 1. Using this and (G.6), we have 



/' 



(l-7+27^(a))^ 



> 



> 



(1-7) = 
(l-7+27ti(a))^ 

(1-7) 
4ti(a) 



xdXj{x) 



xdXy (x) 



/■4tl^a;- 

/ [(l-^)2+^s]rfAl(s) 

Jo 

f 

Jo 



= (1-7)2^ + 7 / sdXi{s) 



(G.7) 



where we used the integral by substitution with s = {x — {1 — 7)^)/7 and the inequaUty (1 — 7 + 
(1 - 7)^ > 47*2 for < t < 1. Substituting (G.7) into (G.6), we have 



liniinf max II a* P^(P^ „)f > liminf 



k 



> a 



(1-7)2+7 



a 



= lim —{l/j-iy + ajF{a) 

n— >oo m 



(G.8) 



where F(a) := (l/a) J^*'^"^' sdXi{s) is an increasing function with respect to a such that lima_,.o F[a) 
and = 1, and 07^ = (lim„_5.co 'r/k){\imn-^co k/m) = lim„_>.oo r/m. 
For noisy measurement F, we have the following inequality: 



< (IIPr^p^^^^ ^^,a,|| + ||P; 



- 2||aj||||Pfl(p^^^^^^y)a,- - P^(p^^^^^^B)a,|| 

< 2||aj||||PR(y)aj - PR(B)aj|| < 2||aj||2||PR(y) - Pr(b)\ 



2\\Pu^Y) - PRiB)\\ < 



(i?))||iV|| 4(«(B) + 1) 



(G.9) 



(7min{B){a^i4B) - \\N\\) SNR^i„(B) - 1 

as rn- 00, where SNRmin(-B) = (Tmin(-B)/||A''||. 

Then we consider two Umiting cases according to the number of measurement vectors. 
(Case 1 : Theorem 5.3) For t < k — r, {m||a^PR(pi^^ )B)II^ • j ^ suppX} are independent chi-squared 
random variables of degree of freedom r so that by Lenoma E.l, we have 



lim max „ , . , . 
n->-oo j^suppX 2 log (n — k) 



= 1. 



(G.IO) 
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Here we assume that 



and 



SNR^i„(r) >1 + 4-{k{B) + 1) 
r 



m> k 



4:k k{B) + 1 



2(1 + 6) 



log (n — k) 



r SNR„,in- 1_ 
Then by Marcenko-Pastur theorem [55], 

lim (Tmin(As) = lim (1 - ^/k/^f > lim fl - v/r/(21og {n-k))Y = 1 

n— foo n— fCJO n—^oo \ / 

SO that 



(G.ll) 



(G.12) 



m||a*P, 



lim inf max ■ „ , . , x 

n^oo jesuppx 2 log (n — k) 



> lim inf 



m 



n->-cx) 2 log (n — fc) fc 

2 



> lim inf ■ 



n->-oo 21og(n — fc) \7 
Combining (G.9) and (G.13), for noisy measurement Y, we have 



(G.13) 



lim inf max 



n^oo jesuppX 2 log (n — k) 



m 



> lim inf ■ 

n— >oo 

> lim inf 



2 log (n - /c) 



n-s-oo 2 log (n — A;) 



4fc k{B) + 1 



7" 



> 1 + 5 



r SNR^in(F)-l. 

if we have (G.12). Hence, when r is a fixed number, if we have (G.12), then we can identify k — r correct 
indices of suppX with subspace S-OMP in LSMMV. 

(Case 2: Theorem 5.4) Similarly as in the previous case, for t < fc — r, {m\\a* Pj^(^p±^^ )B)IP - J ^ 
suppX} are independent chi-squared distribution. Since lim„_>.oo (log n)/r = 0, by Lemma 3 in [16], we 
have 



lim max 

rn-C!0 j^suppX r 



1. 



(G.14) 



By using (G.9), we have 



lim inf max 

n-s-cx) jesuppX 



m\\auPii,p± Y)\ 



> 



1 +F{a) 



1 _ 4 k{B) + 1 1 
7 " aSNR^i„(B)-l^- 



(G.15) 



We let 



SNR^i„(B) >1 + -{k{B) + 1) 
a 



(G.16) 
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and 



m>k{l + 6f :.[2-F(a)f (G.17) 

_ 4 («(B) + 1) ' 



■'■ a SNR„i„-l 



for some 6 > 0. Note that (G.17) is equivalent to 

' a SNRmi„-l 

Again we let 

u := F{a) and u := 



aSNR^i„(B)-l- 

Then for a quadratic function Q(x) = {x — 1)^ + ua; — vx"^, if a; > (1 + 5)(2 — — v), then we have 

2-u " 



= (1 - v)x'' - {2 - u)x + 1 = {1 - v)x 



X — 



{1-v) 



+ 1 (G.18) 



> <5(l + <5)^^— ^ + 1> l + (5(l + (5) (G.19) 

since 1 — w > by (G.16) and < u < 1. Combining (G.15) and (G.18), we have for < t < k — r and 
j G suppX, we have 

liminf max > 1 + ^(1 + 5) 

n— >oo jGsuppX r 

for some 6 > 0. Hence, in the case of lim„^oo f/k = a > 0, we can identify the correct indices of suppX 
if we have (G.17). 
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Fig. 2. Phase transition map for compressive MUSIC with subspace S-OMP when n = 200, SNR = oo, ||x'|p is constant for all 
j = 1, ■ ■ • , n, and (a) r = 3, and (b) r = 16. The overlayed curves are calculated based on (VI. 1). 





Fig. 4. Recovery rates for various m and k when SNR =40dB and non-zero rows of ||x' || are constant for all i. Each row (from top 
to bottom) indicates the recovery rates by S-OMP, 2-thresholding, and compressive MUSIC with subspace S-OMP and 2-thresholding. 
Each column (from left to right) indicates r = 1, 8 and 16, respectively. 
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(a) (b) 



Fig. 5. Recovery rates by various MMV algorithms for a uniform source when n = 200, m = 20, r = 8, and 16 and SNR =40dB: 
(a) recovei'y rate for S-OMP, compressive MUSIC with subspace S-OMP, and mixed norm approach when p = 2,q = 1 and (b) 
recovery rate for 2-tliresholding and compressive MUSIC with 2-thi'esholding. 




Fig. 6. Recovery rates by compressive MUSIC when k — r nonzero supports are estimated by (a) an "oracle" algorithm, (b) subspace 
S-OMP, and (c) 2-thresholding. Here, X is given with t = 0.9, r = 0.7 and r = 0.5. Smaller t provides larger condition number 
k(X). The measurements are corrupted by additive Gaussian noise of SNR =40dB and n = 200, m = 20, r = 8. 




Fig. 7. Cost function for sparsity estimation when n = 200, m = 40, r = 5, k = 10, and the measurements are (a) noiseless and 
(b) corrupted by additive Gaussian noise of SNR =40dB. The circles illustrate the local minima, whose position corresponds to the 
true sparsity level. 
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Fig. 8. Rate regions for tlie multiple measurement vector problem and CS-MUSIC, when (a) r is a fixed number, and (b) 
lim„_^cx3 r/k = a > 0. 



